Method and apparatus for increased performance of sequential I/O operations over busses of differing speeds

ABSTRACT

A method and system for retrieving large contiguous data files is provided. The method divides the data during the storage operation such that it spans two or more logical volumes. The data may be stored over the plurality of volumes in any convenient manner; for example, in the case of two logical volumes, the even address data may be stored on one device and the odd address data on the other device. During an access request (either a read or a write), a host controller receiving the access request will generate, to the associated storage controllers, as many access requests as there are volumes associated with the storage of the data. The storage devices are coupled via separate busses to their respective storage controllers and thus several storage devices may communicate with their associated controllers simultaneously. The data may now be delivered to the host controller from the two or more storage devices simultaneously, thus drastically improving data throughput. The host controller receives the data from the two or more storage controllers, re-assembles the data into a contiguous file, and delivers it to the requesting device.

BACKGROUND OF THE INVENTION

[0001] This invention relates generally to storage systems associated with computer systems and more particularly to providing a method for improving the data throughput associated with a storage system during long sequential input/output (I/O) transactions.

[0002] As it is known in the art, computer systems generally include a central processing unit, a memory subsystem and a storage subsystem. According to a networked or enterprise model of a computer system, the storage subsystem associated with or in addition to a local computer system, may include a large number of independent storage devices or disks housed in a single enclosure. This array of storage devices is typically connected to several computers over a network. Such a model allows for the centralization of data which is to be shared among many users and also allows a single point of maintenance for the storage functions associated with computer systems.

[0003] One type of storage subsystem known in the art is one which includes a number of redundant disk storage devices configured as an array. Such a system is typically known as a RAID storage system. One of the advantages of a RAID type storage system is that it provides a massive amount of storage (typically in the several gigabyte range) and depending upon the RAID configuration may provide several differing levels of fault tolerance. Fault tolerance is typically achieved by providing, in addition to the disk devices that are used for storing the data, a disk device which is used to store parity data. The parity data may be used in combination with the remaining data on the other disk devices to reconstruct data associated with a failed disk device.
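
By way of illustration only, the reconstruction of data from parity described above may be sketched as follows. The sketch assumes byte-wise XOR parity, which is common in parity-based RAID configurations but is not specified in this description; the function name xor_blocks and the sample data are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one small block.
data_disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

# Assumption: the parity disk stores the XOR of all data disks.
parity_disk = xor_blocks(data_disks)

# Suppose disk 1 fails: XOR the surviving data with the parity to rebuild it.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the parity with the surviving disks yields the contents of the failed disk.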

[0004] A disk storage system such as the RAID system described above will typically include one or more front end (or host) adapters/controllers which are responsible for receiving and processing requests from the various host devices which may be connected to the storage system. Additionally, a RAID storage system as described above may also include several disk adapters/controllers which are used to control the transactions between the disk storage devices and the host controller/adapter described above. Some storage systems may also include a very large buffer (e.g. a cache memory) for buffering the data transfers between the disk adapters and the host adapters.

[0005] In addition, a requirement of typical present day storage systems is that they present (or emulate) a particular storage geometry to the host computer. The geometry includes the configuration of the storage devices (i.e. number of cylinders, heads, sectors per track, etc.). The geometry presented to the host may not be the actual physical configuration of the storage devices in the system. As a result, some level of translation must be carried out between the emulated storage parameters and the physical storage parameters.
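
One possible form of the translation between emulated and physical storage parameters is sketched below for illustration. It assumes that the translation passes through a linear block number using the conventional cylinder/head/sector arithmetic; the description does not state how the translation is actually performed, and the Geometry type and the sample geometries are hypothetical.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    heads: int               # heads (tracks) per cylinder
    sectors_per_track: int   # sectors are numbered from 1 within a track


def chs_to_linear(geom, cylinder, head, sector):
    """Convert a cylinder/head/sector address to a linear block number."""
    return (cylinder * geom.heads + head) * geom.sectors_per_track + (sector - 1)


def linear_to_chs(geom, linear):
    """Convert a linear block number back to cylinder/head/sector."""
    cylinder, rest = divmod(linear, geom.heads * geom.sectors_per_track)
    head, sector0 = divmod(rest, geom.sectors_per_track)
    return cylinder, head, sector0 + 1


def translate(emulated, physical, cylinder, head, sector):
    """Map an address in the emulated geometry onto the physical geometry."""
    return linear_to_chs(physical, chs_to_linear(emulated, cylinder, head, sector))


# Hypothetical geometries: what the host sees versus what the disk provides.
emulated = Geometry(heads=15, sectors_per_track=64)
physical = Geometry(heads=4, sectors_per_track=100)
physical_address = translate(emulated, physical, 2, 3, 5)
```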

[0006] The usual solution to the problem associated with front-end emulation and back-end control is to relegate the two responsibilities to two separate processors. The front-end processor manages all that concerns operation of the front-end (host) controller. That is, it maintains information about the file system such as the list of logical volumes and logical subdivisions such as cylinders, tracks, etc. The back-end processor is typically assigned tasks which are transparent to the host such as data mirroring, data striping, RAID protection, concurrent copy and others. As such, the back-end processor is typically loaded with much more functionality than the front-end processor.

[0007] In a RAID storage system as described above, the storage system (front-end processor) may be connected to the host via a bus which has different physical characteristics (e.g., data throughput, contention time, etc.) than the bus which connects the physical devices to the storage controller (back-end processor). For example, the host adapter may be coupled to the network (i.e., host or requesting systems) via a so called wide SCSI bus operating at speeds which allow the transfer of data at up to 20 megabits per second. The host adapter is then typically coupled to either a cache or directly to a disk adapter via a communication bus which allows for data transmission rates which are at least as fast as the wide SCSI bus. The disk adapters, however, may be connected to the associated disk storage devices by a so called narrow SCSI bus which runs at half the speed of a wide SCSI bus (i.e., data rates up to 10 megabits per second).

[0008] It will be appreciated by those of skill in the art that for the configuration described above, the transmission rate mismatch may result in a performance bottleneck during long sequential input/output activity. That is, during a read of large amounts of sequential data, the associated disk device will transmit its data to the disk controller at a rate which is half the speed at which the host controller can transmit the data to the requesting device. Thus the host adapter and, as a result, the host device spend an inordinate amount of time waiting for data, thus wasting processing time. Similarly, when a host device needs to write a large amount of sequential data to a particular disk storage device, the host device will be able to transmit the data to the host adapter at a rate which is twice as fast as the rate at which the disk adapter can transmit the data to the storage device. As a result the host device and its associated bus would be stalled while waiting for the disk adapter to transmit the associated data to the disk.

[0009] Two attempts at solving the above problem have included the use of pipelining techniques or so called prefetch mechanisms. During pipelining, for a read operation, the host adapter will begin transferring the associated data to the requesting host device before the entire read has been satisfied from the disk adapter into the associated buffer or cache. The objective of this technique is to tune the data transfer such that data is transferred to the host device by the host adapter in blocks, such that one block has been completely transferred to the host device just after the disk adapter has finished placing the next sequential block of data into the cache for transfer. A similar scheme is employed during writes of data, with the transfer of data to the disk device from the host adapter.
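
The effect of pipelining on a read may be illustrated with the following simplified timing model. The model is an assumption made for illustration: it treats the disk adapter and host adapter as two stages with fixed per-block times and an unbounded cache between them, and the function names are hypothetical.

```python
def unpipelined_time(n_blocks, disk_time, host_time):
    """Every block is fully staged into cache, then fully sent, with no overlap."""
    return n_blocks * (disk_time + host_time)


def pipelined_time(n_blocks, disk_time, host_time):
    """Host adapter sends block i while the disk adapter stages block i + 1."""
    staged_at = 0.0  # time at which the disk adapter finishes staging a block
    sent_at = 0.0    # time at which the host adapter finishes sending a block
    for _ in range(n_blocks):
        staged_at += disk_time
        # A block can be sent only once it is staged and the host side is free.
        sent_at = max(sent_at, staged_at) + host_time
    return sent_at


# Disk side takes twice as long per block as the host side (arbitrary units).
serial_total = unpipelined_time(4, disk_time=2.0, host_time=1.0)
overlap_total = pipelined_time(4, disk_time=2.0, host_time=1.0)
```

Under these assumptions pipelining hides the host-side transfer time, but the total remains bounded below by the time the slower disk-side stage needs to stage all of the blocks.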

[0010] The second approach, which utilizes a prefetch scheme, involves the use of a caching algorithm which is designed to minimize the occurrences of cache misses. To do so, large amounts of data are prefetched from the disk storage device into the cache with the belief that the data prefetched will be the data which is requested by the host device, thus eliminating a second transaction from the disk adapter to the disk storage device to place the data in the cache for transfer to the host device. The drawback to the prefetching scheme is that its effectiveness depends crucially on the size of the cache and assumes that the disk adapter is not continuously busy fulfilling I/O requests.

[0011] It would be advantageous, therefore, to provide a system which allows long sequential I/O transactions between a disk storage device and a host device to occur without the usual bottlenecks associated with such a transfer.

SUMMARY OF THE INVENTION

[0012] In accordance with the present invention, a method of performing input/output operations between a requesting device and a responding device, where the responding device includes a plurality of storage devices, includes the following steps. In response to receipt of a write access request by a requesting device, where the write access is a write of a sequential block of data, storing portions of the block of sequential data on each of the plurality of storage devices. In response to receipt of a read access request by the requesting device for the previously stored block of sequential data, generating an access request to each of the storage devices on which the portions of data are stored. Retrieving each of the portions of the block of sequential data from each of the plurality of storage devices and assembling the retrieved portions into the original block of sequential data and transmitting the retrieved portions to the requesting device as a block of sequential data. With such an arrangement, data may be delivered to a requesting device at a constant rate without stalling even though the communications path to any one of the storage devices is slower than the communication rate of the requesting device.

[0013] In accordance with another aspect of the present invention, a storage system is provided which includes a plurality of storage devices each configured to store blocks of data. The storage system further includes a plurality of storage controllers where each storage controller is coupled respectively to one of the storage devices. Each of the storage controllers is operable to transmit and receive data to and from its corresponding storage device. The storage system further includes a request controller which receives from a requesting device an access request for a sequential block of data. The request controller is coupled to each of the storage controllers and is responsive to the access request for generating an access request to each of the plurality of storage controllers having the portions of data associated with the requested sequential block stored thereon. With such an arrangement, a storage system is provided which allows for increased performance for transactions involving large amounts of sequential data since the data may be stored and subsequently retrieved from several individual devices rather than retrieved as a stream from a single device.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] The above and further advantages of the present invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings in which:

[0015] FIG. 1 is a block diagram representation of a networked computer system;

[0016] FIG. 2 is a diagrammatic representation of the disk storage subsystem of the computer system of FIG. 1;

[0017] FIG. 3 is a diagrammatic representation of device ID tables associated with prior art bus controllers.

[0018] FIG. 4 is a diagrammatic representation of device ID tables associated with the bus controllers and method for storing data according to the present invention.

[0019] FIG. 5 is a diagrammatic representation of the storage techniques associated with data stored on the disks of the storage system of FIG. 2 using the device ID tables of FIG. 4.

[0020] FIG. 6 is a diagrammatic representation of the storage techniques associated with data stored on the disks of the storage system of FIG. 2 and device ID tables of FIG. 4.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0021] Referring now to FIG. 1, network computer system 10 is shown to include, inter alia, computers 11 and 12 a through 12 n coupled to server 13 via network or bus 15. Computers 11 and 12 a through 12 n may be any one of several well known types of computers such as personal computers or workstations. Server 13 is further coupled to storage system 14 thereby providing a link between individual computers 11 and 12 and storage system 14 via server 13. Like many network computer systems, server 13 typically provides the scheduling and routing of data between any of the individual computers 11 and 12 and disk storage subsystem 14 as well as between the individual computers 11 and 12 themselves. Although not shown, storage system 14 may also be connected to several servers similar to server 13 and thereby service several other network computer systems. Additionally, storage system 14 may be coupled directly to one or more individual computer systems rather than being connected to several computers through a server such as server 13.

[0022] Storage system 14 may include several individual storage devices such as magnetic disks as well as controller cards and cache memory all housed within a single enclosure. For purposes of discussion, the preferred embodiment will be described according to the notion that storage elements included within storage system 14 are magnetic disks. It should be noted however that the storage elements could be any storage medium which allows storage and retrieval of data (for example, random access memory or magnetic tape or optical disks) and thus should not be seen as a limitation of the present invention.

[0023] Referring now to FIG. 2, storage system 14 of FIG. 1 is shown in more detail to include, among other things, a bus or host controller 22 coupled to a buffer or cache 24 which is further coupled to a plurality of disk controllers 26 a through 26 n. Each of the controllers 26 a-26 n is further coupled respectively to storage devices 28 a through 28 n. Although not shown in the figure, each controller 26 a-26 n may control a plurality of physical storage devices. According to the preferred embodiment, here storage devices 28 a through 28 n are magnetic disk devices. Storage system 14 communicates or transfers data to and from requesting devices via bus 15. Requests by one of the computers connected via server 13 to storage subsystem 14 will be received by bus controller 22 which, as will be described in further detail below, will provide the appropriate request to the individual disk controller or disk controllers corresponding to the storage devices which contain the data to be read or are the target devices of data to be written.

[0024] According to a preferred embodiment of the present invention, the host controller determines the storage device to which an access request is to be generated using a data structure here called a device ID table. Referring now to FIG. 3, one example of a device ID table is shown. Each logical device has an associated device ID table (for example table 44) which includes a plurality of track ID tables 52 which further include information used to locate the track in cache memory.

[0025] Referring now to FIG. 4 and according to the present invention, a device ID table structure is shown as having a hierarchical structure and including a linear succession of track ID tables which are grouped into cylinders to provide a cross reference to the disk devices. As shown, there is a device ID table for each logical device of the system. In addition, a device ID table (e.g. table associated with device 64) is broken down into (or includes) two partial device ID tables 68 and 69. Each of the partial tables further includes a plurality of track ID tables. Here each of the track ID tables accommodates 40H blocks of data. The track ID tables include among other things pointers into the cache memory 24 where a particular track associated with one of the storage devices is located should the track be currently stored in the cache. The logical blocks of data may be mapped contiguously onto the track ID table.
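
For illustration only, the hierarchical table structure described above might be modeled as follows. The class and field names are hypothetical, a cache pointer is represented simply as an optional integer slot number, and the grouping of track ID tables into cylinders is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCKS_PER_TRACK = 0x40  # each track ID table accommodates 40H blocks


@dataclass
class TrackIDTable:
    # Cache slot currently holding this track, or None if it is not cached.
    cache_slot: Optional[int] = None


@dataclass
class PartialDeviceIDTable:
    tracks: List[TrackIDTable] = field(default_factory=list)


@dataclass
class DeviceIDTable:
    # One partial table per split of the logical device (two in FIG. 4).
    partials: List[PartialDeviceIDTable] = field(default_factory=list)


def make_device_table(n_partials, tracks_per_partial):
    """Build an empty device ID table with the given shape."""
    return DeviceIDTable(
        partials=[
            PartialDeviceIDTable(
                tracks=[TrackIDTable() for _ in range(tracks_per_partial)]
            )
            for _ in range(n_partials)
        ]
    )


def cache_lookup(table, partial_index, track_index):
    """Return the cache slot for a track, or None on a cache miss."""
    return table.partials[partial_index].tracks[track_index].cache_slot


device_table = make_device_table(n_partials=2, tracks_per_partial=4)
device_table.partials[1].tracks[3].cache_slot = 7  # pretend this track is cached
```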

[0026] The individual disk controllers 26 a-26 n (FIG. 2) each maintain a separate cross reference table (not shown) that relates a particular track in a particular logical volume to its locations on a physical storage device. The mapping from the track ID table into the cylinder head sector structure for a disk storage device is contiguous as well. Generally, each logical device may be mapped by a disk controller into a part of a single physical storage device (or disk). When an I/O request arrives at the bus controller, the corresponding track ID table is identified by the bus controller through a direct calculation from which it identifies the disk controllers which will be responsible for handling the I/O request. Consequently the bus controller posts a read or write request to each of the identified disk controllers' associated request buffer (not shown) which is identified in the device ID table. During normal operation the disk controllers poll their corresponding request buffers and fulfill I/O requests accordingly.
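
The posting and polling of requests described above may be sketched as follows. This is a minimal illustration under stated assumptions: the DiskController class, the track_owner list (standing in for the information held in the device ID table), and the request tuple format are all hypothetical.

```python
from collections import deque

BLOCKS_PER_TRACK = 0x40  # assumed track size, in blocks


class DiskController:
    """Hypothetical disk controller that polls its own request buffer."""

    def __init__(self, name):
        self.name = name
        self.request_buffer = deque()
        self.completed = []

    def poll(self):
        # Drain the request buffer, fulfilling each posted request in order.
        while self.request_buffer:
            self.completed.append(self.request_buffer.popleft())


def post_request(controllers, track_owner, op, logical_block):
    """Bus controller side: locate the track directly, then post the request."""
    track = logical_block // BLOCKS_PER_TRACK  # direct calculation
    target = controllers[track_owner[track]]
    target.request_buffer.append((op, track))
    return target


controllers = [DiskController("26a"), DiskController("26b")]
track_owner = [0, 1, 0, 1]  # assumed mapping: track index -> controller index
posted_to = post_request(controllers, track_owner, "read", 0x40)
pending = list(posted_to.request_buffer)
posted_to.poll()
```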

[0027] As will be described in more detail below, rather than store sequential data contiguously on a single disk, the present invention interleaves portions of logically sequential data over two or more disks. It would be expected that the task of interleaving data between physical disks should be handled by the disk controllers. This is the process which would need to be followed using the device ID table structure of FIG. 3. However, according to the present invention, the bus controller 22 is operable to provide the interleaving of data across many physical disks. This is accomplished in part by providing an increased number of device ID tables, as described above, where each logical device is actually represented by what would appear, from the disk controllers' perspective, to be two or more devices, which respectively house the even and odd numbered tracks (in the case of a 2:1 interleave ratio). Using the device ID table structure described above, the bus controller takes on the responsibility for posting sequential I/O requests to the corresponding splits (disks) of a logical volume in a manner which is completely transparent to the disk controllers. This method of interleaving data on a storage system reduces the design complexity of the disk controllers, since functionality which would ordinarily require the addition of further complexity to already complex disk controller software is now achieved through the intelligent use of elaborate device ID tables in the bus controller.

[0028] According to the preferred embodiment, in order for the host adapter to generate access requests to the disk devices holding (or being the destination of) the associated data, the additional device ID tables described above will be provided (the number dictated by the number of disks over which the data is spread). For example, assuming that a sequential block of data was split between two disk devices, there would be two device ID tables, one corresponding to the even track ID tables, and one corresponding to the odd track ID tables (see FIG. 4).
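
The correspondence between a logical track and the even or odd device ID table may be sketched as follows. The modulo mapping shown is one straightforward reading of the even/odd arrangement and is offered as an assumption; the function names are hypothetical, and the n_splits parameter generalizes the 2:1 interleave ratio.

```python
def split_for_track(track, n_splits=2):
    """Map a logical track to (split index, track index within that split)."""
    return track % n_splits, track // n_splits


def track_for_split(split, local_track, n_splits=2):
    """Inverse mapping: recover the logical track number."""
    return local_track * n_splits + split


# With two splits, even tracks fall in split 0 and odd tracks in split 1.
even_example = split_for_track(4)
odd_example = split_for_track(5)
```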

[0029] According to a preferred embodiment of the present invention, bus 15 which connects the bus controller to the external devices such as server 13 is a so called wide SCSI bus. A wide SCSI bus is typically 16 bits in width and can transmit up to 20 megabits of data per second. Further in accordance with a preferred embodiment of the present invention, busses 27 a through 27 n which couple disk controllers 26 a through 26 n respectively to disk devices 28 a through 28 n are narrow SCSI busses which are typically 8 bits in width and can transmit data at a rate of up to 10 megabits per second.

[0030] As described earlier, the mismatch in speeds between busses 27 a through 27 n and bus 15 may cause a condition where any one of the host devices may be stalled waiting for data to be delivered from one of the disk devices or stalled waiting to be able to write data to one of the disk devices. This is a result of the host device being able to transmit at a speed which is here twice as fast as the speed at which the disk controller can communicate with the storage device. In accordance with the present invention, a method is provided which allows the disk devices to satisfy the data access request from any one of the host devices without experiencing the stall condition previously described.

[0031] Referring now to FIG. 5, an exemplary one of disks 28 a through 28 n of FIG. 2 is shown. Data is typically stored on disk 28 n which may contain one or several logical volumes where a logical volume is a means to identify a particular section of a storage device such as disk 28 n and thus allow access to the data stored within that logical volume. The logical volumes of disk 28 n are further divided into a number of tracks, here track 0 through track n, where the tracks are further divided into sectors 32. Finally, the sectors are divided into sets of blocks 34 where here set of blocks 34 contains eight blocks labeled block 0 through block 7. It should be understood that a contiguous section of data might typically span several blocks in several sectors as well as several tracks. As shown in the figure, a sequential data file is shown to include blocks data1 through data15 and is stored sequentially in blocks 0-7 of each sector 34 and 35. One example of a large amount of sequential or contiguous data is a video, movie, or audio file, or some other large amount of data which is typically delivered in a stream or contiguous format. The format shown for the storage of data1-data15 is typical of prior art systems. To achieve this type of data arrangement on the physical drives, the device ID tables of FIG. 3 would typically be used to generate the access request. That is, this scheme results in the data being stored on a single logical volume, thereby requiring that the delivery of the data occur over a single bus, such as bus 27, to a single disk controller, and ultimately to the requesting host device.

[0032] According to the present invention, rather than store the contiguous blocks of data as described in connection with FIG. 5 on a single logical volume and thereby require communication over a single bus, the contiguous data is here distributed over two or more logical volumes associated with different disk controllers, thus allowing for the retrieval of data from storage system 14 at a faster rate.

[0033] Referring now to FIG. 6, disks 28 a and 28 b of storage system 14 (FIG. 1) are shown to have the same structure of data storage as the disk associated with FIG. 5 down to and including the sector segmentation of the disk. However, unlike the arrangement shown in FIG. 5, data will be stored according to the present invention such that the even address blocks of data for a large contiguous file will be stored on disk 28 a (controlled by controller 26 a (FIG. 2)) while the odd address blocks of the contiguous file will be stored on disk 28 b (controlled by controller 26 b (FIG. 2)). This arrangement of data results in the even blocks of the contiguous data file being stored in sector 36 a of disk 28 a while the odd blocks of the contiguous data file are stored in sector 36 b of disk 28 b.
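
The even and odd distribution of blocks may be illustrated with the short sketch below. The function name stripe_blocks and the zero-based block labels are assumptions made for the example; each returned list stands in for the blocks written to one disk.

```python
def stripe_blocks(blocks, n_disks=2):
    """Distribute a sequence of blocks round-robin over n_disks lists."""
    return [list(blocks[i::n_disks]) for i in range(n_disks)]


# Eight hypothetical blocks of one contiguous file, addressed 0 through 7.
blocks = [f"data{i}" for i in range(8)]
disk_28a, disk_28b = stripe_blocks(blocks)
```

Here disk_28a receives the even address blocks and disk_28b the odd address blocks, each in its original relative order.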

[0034] Referring back to FIG. 2, it should be noted that each of the disk controllers associated with the respective disk devices can communicate with its associated disk devices simultaneously. Thus, when a host device requests an access to a large contiguous section of data, for example, during a read transaction, bus controller 22, rather than initiating a single access request to a single controller, will here, according to the present invention, generate an access request to each of the disk controllers 26 a and 26 b which will then each simultaneously retrieve the associated even and odd blocks of data associated with the access request. It should be understood that disk controllers 26 a and 26 b do not need to have any knowledge of the arrangement of the data stored on the respective disk and thus can respond to a request as previously done. A request to each of the controllers will merely include a starting address and ending address of the required data, and the disk controllers will access their disk devices and deliver the data through buffer 24 ultimately to bus controller 22. As described above, the tracking of which device stores which sections of data is done through the device ID tables and associated track ID tables.
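
One way the bus controller might derive the starting and ending address for each disk controller from a single host request is sketched below. The arithmetic assumes the round-robin block placement of the even/odd example and zero-based block addresses; the function name per_split_ranges is hypothetical.

```python
def per_split_ranges(first_block, last_block, n_splits=2):
    """For each split, return (start, end) local addresses, or None if untouched.

    Assumes logical block b is stored on split b % n_splits at local
    address b // n_splits, and that both bounds are inclusive.
    """
    ranges = []
    for k in range(n_splits):
        # First and last logical blocks within the request that live on split k.
        lo = first_block + ((k - first_block) % n_splits)
        hi = last_block - ((last_block - k) % n_splits)
        if lo <= hi:
            ranges.append((lo // n_splits, hi // n_splits))
        else:
            ranges.append(None)
    return ranges


# A host request for logical blocks 3 through 10 over two splits.
requests = per_split_ranges(3, 10)
```

Each disk controller thus sees an ordinary contiguous range on its own device and needs no knowledge of the interleaving.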

[0035] In order for the host devices to receive the requested data in the required format, bus controller 22 receives the data from the disk controllers 26 a and 26 b and re-assembles the even and odd blocks to provide the correct contiguous section of data. As can be seen then, since controllers 26 a and 26 b can communicate simultaneously with their associated disks 28 a and 28 b, data can be delivered at twice the speed that would be achieved if data were only being delivered from or to a single device (disk controller). Therefore, the host devices will not experience a bottleneck or a hold-off condition waiting for data to be delivered from or delivered to the disk devices of storage system 14. The above description made the assumption that only two disks or arrays of disks (i.e. communications with two controllers) were used for the storage of a large contiguous section of data. Thus the even blocks of data were shown to be stored on one disk(s) while the odd blocks of data were shown to be stored on a second disk(s). It should be understood, however, that this concept is extensible to disks associated with several controllers and is only limited by the number of controllers (and associated disks) available and the data bandwidth required to satisfy the speed of the host devices. That is, for example, if data needs to be delivered at four times the speed of busses 27 a through 27 n, then four disks (or arrays of disks) could be used to store a large section of contiguous data, with each disk storing every fourth block. Thus a request for the contiguous block of data received at bus controller 22 would cause bus controller 22 to generate four access requests, one to each of the four disk controllers associated with the disks containing the interleaved blocks of contiguous data. Upon delivery of the data from the disks to the bus controller, the bus controller would reassemble the data received from the four disk devices and deliver it to the host or requesting devices as a single contiguous block of data.
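
The extension to more than two disks may be sketched as follows. The sketch assumes that the number of disks is chosen as the ratio of the host bus rate to the disk bus rate, rounded up, and that blocks were placed round-robin; the function names are hypothetical and the rates are in arbitrary but matching units.

```python
import math
from itertools import zip_longest

_MISSING = object()  # sentinel marking the end of a shorter stream


def disks_needed(host_rate, disk_bus_rate):
    """Smallest number of disk busses whose combined rate matches the host bus."""
    return math.ceil(host_rate / disk_bus_rate)


def reassemble(streams):
    """Interleave per-disk streams back into one contiguous sequence."""
    contiguous = []
    for group in zip_longest(*streams, fillvalue=_MISSING):
        contiguous.extend(block for block in group if block is not _MISSING)
    return contiguous


# Four disks, each holding every fourth block of a ten-block file.
streams = [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
contiguous = reassemble(streams)
n_disks = disks_needed(host_rate=40, disk_bus_rate=10)
```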

[0036] Having described a preferred embodiment of the present invention, it will now become apparent to those of skill in the art that other embodiments incorporating its concepts may be provided. It is felt therefore that this invention should not be limited to the disclosed embodiment but rather should be limited only by the spirit and scope of the appended claims. 

What is claimed is:
 1. A method of performing input/output (I/O) operations between a requesting device and a responding device, said responding device includes a plurality of storage devices, said method comprising the steps of: in response to receipt of a write access request from said requesting device, wherein said write access request includes a sequential block of data for storage, storing a portion of said block of sequential data on each of said plurality of storage devices; in response to receipt of a read access request by said requesting device for said block of sequential data, generating an access request to each of said storage devices; substantially simultaneously retrieving each of said portions of said block of sequential data from each of said plurality of storage devices; and assembling said retrieved portions into said block of sequential data; transmitting said retrieved portions to said requesting device as said block of sequential data.
 2. A method of performing input/output (I/O) operations between a requesting device and a responding device wherein said requesting device is coupled, via a first bus, to an associated request controller, said responding device includes a plurality of storage devices each coupled to associated storage controllers over second respective busses, said method comprising the steps of: storing a portion of a block of sequential data on each of said plurality of separate storage devices; in response to an access request by said requesting device for said block of sequential data, generating, by said request controller, an access request to each of said storage controllers associated with said storage devices storing said portions of said sequential block of data; substantially simultaneously retrieving each of said portions by each of said associated storage controllers; transmitting said retrieved portions to said request controller; re-assembling said portions into said sequential block of data; and transmitting said sequential block of data to said requesting device.
 3. A method of performing input/output (I/O) operations between a requesting device and a responding device wherein said requesting device is coupled, via a first bus, to an associated request controller, said responding device includes a plurality of storage devices each coupled to associated storage controllers over second respective busses and where a block of sequential data is stored in portions over the plurality of storage devices, said method comprising the steps of: in response to an access request by said requesting device for said block of sequential data, generating, by said request controller, an access request to each of said storage controllers associated with said storage devices having said portions of said sequential block of data stored thereon; substantially simultaneously retrieving each of said portions by each of said associated storage controllers; transmitting said retrieved portions to said request controller; re-assembling said portions into said sequential block of data; and transmitting said sequential block of data to said requesting device.
 4. An apparatus comprising: a plurality of storage devices, each of said storage devices having a storage controller coupled thereto, said storage devices arranged to store portions of sequential blocks of data; a request controller responsive to a request for a sequential block of data for generating a plurality of access requests to ones of said storage controllers associated with ones of said storage devices having portions of said requested block of data stored thereon, said request controller operable to receive said portions of said requested block from said storage controllers and to transmit said received portions as said sequential block of data.
 5. A storage system comprising: a plurality of storage devices configured to store portions of a sequential block of data; a plurality of storage controllers coupled respectively to said plurality of storage devices, each of said storage controllers operable to transmit and receive data to and from a respective one of said storage devices; a request controller for receiving from a requesting device an access request for said sequential block of data; said request controller coupled to each of said storage controllers and responsive to said access request for generating a plurality of access requests to said plurality of storage controllers;
 5. The storage system as in claim 4 wherein said access request is a write request and said request controller is operable in response thereto for generating a plurality of write requests to said plurality of storage controllers and for transmitting to said plurality of storage controllers corresponding portions of said sequential block for storage on associated ones of said storage devices.
 6. The storage system as in claim 4 wherein said access request is a read request and said request controller is operable in response thereto for generating a plurality of read requests to said plurality of storage controllers to retrieve from said storage devices all of said portions corresponding to said sequential block of data, said request controller further operable to assemble said retrieved portions into said sequential block of data and transmit said sequential block of data to said requesting device.
 7. The storage system as in claim 4 wherein each one of said storage controllers is coupled to a separate one of said plurality of storage devices via a separate bus and wherein all of said storage controllers can communicate with respective ones of said storage devices substantially simultaneously. 